Authored by Rice Paddies
# libraries
from IPython.display import Image
from IPython.display import HTML
from sympy import Matrix
from scipy.linalg import lu
import numpy as np
import pandas as pd
We denote a set with an upper case italic letter such as A. In the context of linear algebra, we say that a line is a set of points, and the set of all lines in the plane is a set of sets. Similarly we can say that vectors are sets of points while matrices are sets of vectors.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture0.PNG')
A subset can be viewed as a filtered-down set. Take as an example the set of all UFC cagefighters, which I'll denote as U. I can filter for "z is Asian". This assertion or condition S(z) is true for some members of the set of all UFC fighters and false for others.
Therefore, such a sentence, evaluated for every member of U, generates a subset: the set of all Asian UFC fighters.
The vertical bar (|) reads as “such that”. Therefore, we can read the above expression as: all elements of z in U such that z is Asian. And that’s how we obtain the set B.
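As a quick illustration (the fighter names and attributes below are made up for demonstration), the set-builder expression can be mimicked with a Python set comprehension:

```python
# Hypothetical set U of (name, origin) pairs; B = {z in U | z is Asian}
U = {("Zhang Weili", "Asian"), ("Jon Jones", "American"), ("Doo Ho Choi", "Asian")}
B = {z for z in U if z[1] == "Asian"}
print(B)  # the subset of Asian fighters
```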
An ordered pair is denoted as (X,Y), with X as the first coordinate and Y as the second coordinate. A valid ordered pair has the property that (X,Y) ≠ (Y,X). Unordered pairs do not hold this property: {X,Y} = {Y,X}
In set theory, relations are defined as sets of ordered pairs and denoted as R; we write xRy to mean (x,y) ∈ R.
For any z ∈ R, there exist x and y such that z = (x,y).
Domain and Range:
f: X → Y or f(x) = y
Matrices can be thought of as functions acting on vectors or other matrices. In simplistic terms, machine learning can be described as transformations or mappings from the domain onto the range of a function. The domain X is usually a vector (or set) of variables mapping onto a vector of target values.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture1.PNG')
Three types of vectors:
Code for 3x1 matrix:
x=np.array([[5], [10], [15]])
print(x)
print ('--------')
print(x.shape)
[[ 5] [10] [15]] -------- (3, 1)
Code for 3x3 matrix:
x = np.array([[5, 6, -2], #row 1
[1, 0, 926], #row 2
[-3, 4, 6]]) #row 3
print (x)
[[ 5 6 -2] [ 1 0 926] [ -3 4 6]]
x = np.array([[3], [6], [0]])
y = np.array([[1], [5], [6]])
x+y
array([[ 4],
[11],
[ 6]])
# alternative
np.add(x,y)
array([[ 4],
[11],
[ 6]])
x = np.array([[0,2],
[1,4]])
y = np.array([[3,1],
[-3,2]])
x+y
array([[ 3, 3],
[-2, 6]])
x = np.array([[4], [-9]])
print (x)
print ('-------')
# multiply with scalar
print (10*x)
[[ 4] [-9]] ------- [[ 40] [-90]]
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture1.5.PNG')
# set 2x2 matrix
x = np.array([[4, 0], [1, -9]])
print (x)
# multiply with scalar
2*x
[[ 4 0] [ 1 -9]]
array([[ 8, 0],
[ 2, -18]])
Linear Combinations: taking the fundamental elements of a matrix (i.e. set of vectors) to generate a new object
There are only two legal operations with vectors in linear algebra: addition and multiplication by scalars.
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture2.PNG')
x = np.array([[4, 2]])
y = np.array([[5, 1]])
print(x)
print ('-----------')
print(y)
print ('-----------')
print (3*x + -10*y)
[[4 2]] ----------- [[5 1]] ----------- [[-38 -4]]
In machine learning, unless made explicit, we can safely assume that an inner product refers to the dot product.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture2.3.PNG')
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture2.5.PNG')
To multiply two vectors with dimensions (2x1) in NumPy, we need to transpose the first vector with .T and then multiply with the @ operator:
x, y = np.array([[3], [-4]]), np.array([[6], [-6]])
print(x)
print ('-----------')
print(y)
# transpose matrix x
print ('-----------')
print(x.T)
# dot product
x.T @ y
[[ 3] [-4]] ----------- [[ 6] [-6]] ----------- [[ 3 -4]]
array([[42]])
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture3.PNG')
# matrices in proper shape for dot product
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8], [9, 10], [11,12]])
print(x)
print ('-----------')
print(y)
print ('-----------')
#dot product
print(np.dot(x,y))
[[1 2 3] [4 5 6]] ----------- [[ 7 8] [ 9 10] [11 12]] ----------- [[ 58 64] [139 154]]
Vector space: a set of proper vectors and all possible linear combinations of the set.
Consider the vectors x and y and the scalars α and β. If we take all possible linear combinations of αx+βy we would obtain the span of such vectors. If our vectors x and y point into different directions in the 2-dimensional space, we get that the span(x,y) is equal to the entire 2-dimensional plane.
What would happen if the vectors point in the same direction? They span a line, as shown in the left-most image below. When two variables are "collinear" they point in the same direction, hence they provide redundant information, so we can drop one without information loss.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture3.5.PNG')
A vector subspace (or linear subspace) is a vector space that lies within a larger vector space. Consider a subspace S. For S to be a valid subspace it has to meet three conditions: it contains the zero vector, it is closed under addition, and it is closed under scalar multiplication.
Think of closure as being unable to "jump out" from one dimensional space into another. A pair of vectors lying flat in the 2-dimensional space cannot (by either addition or multiplication) "jump out" into the 3-dimensional space.
A set of vectors is linearly dependent if at least one vector can be obtained as a linear combination of other vectors in the set. A set of vectors is linearly independent if no vector can be obtained as a linear combination of other vectors in the set. From a machine learning perspective, linearly dependent vectors contain redundant information whereas linearly independent vectors do not.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture4.PNG')
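A sketch of checking dependence with NumPy: if the rank of a matrix is smaller than its number of columns, at least one column is a linear combination of the others.

```python
import numpy as np

# The third column equals col1 + col2, so this set of columns is linearly dependent
A = np.array([[1, 0, 1],
              [0, 1, 1],
              [2, 3, 5]])
print(np.linalg.matrix_rank(A))  # 2 < 3 columns -> dependent

# Replace the third column so no column is a combination of the others
B = np.array([[1, 0, 0],
              [0, 1, 0],
              [2, 3, 1]])
print(np.linalg.matrix_rank(B))  # 3 -> independent
```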
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture8.PNG')
Null space of a set of vectors: all linear combinations that “map” into the zero vector (0,0)
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture5.PNG')
Vector Norm: the length of a vector (analogous to an absolute value), measured as the distance between its "origin" and its "end"
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture10.PNG')
Manhattan Norm ($L_{1}$): gets its name in analogy to measuring distances while moving in Manhattan, NYC. Since Manhattan has a grid shape, the distance between any two points is measured by moving in vertical and horizontal lines (instead of diagonals as in the Euclidean norm).
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture11.PNG')
x = np.array([[3],[-4]])
np.linalg.norm(x, ord = 1)
7.0
Euclidean Norm ($L_{2}$): the Euclidean norm is one of the most popular norms in machine learning, often referred to as "the norm" of a vector. "Distance" stands for the Euclidean distance or $L_2$ norm unless otherwise noted.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture12.PNG')
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture71.PNG')
# euclidean norm / magnitude of vector in 4 dimensions
u = np.array([[1],[2], [-4], [1]])
np.round(np.linalg.norm(u, 2), 2)
4.69
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture72.PNG')
# euclidean distance calculation in 3 dimensions
u = np.array([3, -6, 2])
v = np.array([1, 2, -1])
distance = np.linalg.norm(u-v)
np.round(distance, 2)
8.77
Max Norm: the largest absolute value among the elements of the vector
x = np.array([[3], [-4], [1/20], [-6]])
np.linalg.norm(x, np.inf)
6.0
Unit Vector: has length of 1; the unit vectors [0,1] and [1,0] can form together any other vector
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture69.PNG')
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture68.PNG')
# unit vector
u = np.array([[2], [3], [5]])
print (u)
print ('----------------')
u_hat = u * (1/(np.linalg.norm(u, 2))) # denominator is Euclidean norm
print (np.round((u_hat), 3))
[[2] [3] [5]] ---------------- [[0.324] [0.487] [0.811]]
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture13.PNG')
Compute the cos(θ) value between a pair of vectors:
x = np.array([[3], [4]])
y = np.array([[4], [3]])
print(x)
print('-----')
print (y)
# solve for cos(θ)
cos_theta = np.dot(x.T, y) / (np.linalg.norm(x,2)*np.linalg.norm(y,2))
cos_theta
print('-----')
print(f'cos(θ) = {np.round(cos_theta, 3)}')
# solve for θ value
theta_rad = np.arccos(cos_theta)
print('-----')
print(f'θ = {np.round(theta_rad, 3)} radians')
# convert angle from radians to degrees
theta_deg = theta_rad * (180/np.pi)
print('-----')
print(f'θ = {np.round(theta_deg, 3)} degrees')
[[3] [4]] ----- [[4] [3]] ----- cos(θ) = [[0.96]] ----- θ = [[0.284]] radians ----- θ = [[16.26]] degrees
Orthogonality: perpendicularity of vectors in any number of dimensions; orthogonal vectors are always linearly independent, although linearly independent vectors need not be orthogonal.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture15.PNG')
# are the two vectors below orthogonal?
x = np.array([[4], [0]])
y = np.array([[0], [4]])
cos_theta = np.dot(x.T, y) / (np.linalg.norm(x,2) * np.linalg.norm(y,2))
print(f'cos(θ) = {np.round(cos_theta)} so x and y vectors are orthogonal to each other')
cos(θ) = [[0.]] so x and y vectors are orthogonal to each other
For matrices, there is no such thing as division: you can add, subtract, and multiply matrices, but you cannot divide them. Also, matrix-to-matrix multiplication is not commutative, meaning that multiplication order matters!
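A quick check of non-commutativity with a pair of small example matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
print(A @ B)  # [[2 1], [4 3]]
print(B @ A)  # [[3 4], [1 2]] -- different result, so A@B != B@A
```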
Identity Matrix: an identity matrix is a square matrix with ones on the diagonal from the upper left to the bottom right and zeros everywhere else. We denote the identity matrix as $I_{n}$. We define $I∈R^{n×n}$ as:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture17.PNG')
# 2x2 identity matrix
I_2x2 = np.identity(2)
print (I_2x2)
print ('------------')
# 3x3 identity matrix
I_3x3 = np.identity(3)
print (I_3x3)
print ('------------')
# 4x4 identity matrix
I_4x4 = np.identity(4)
print (I_4x4)
[[1. 0.] [0. 1.]] ------------ [[1. 0. 0.] [0. 1. 0.] [0. 0. 1.]] ------------ [[1. 0. 0. 0.] [0. 1. 0. 0.] [0. 0. 1. 0.] [0. 0. 0. 1.]]
Inverse Matrix: a matrix that, when multiplied with another matrix (from either the right or the left side), returns the identity matrix. Matrices that don't have an inverse are called noninvertible or singular matrices.
Since matrices cannot be divided, multiplying by an inverse plays the role of division, just as multiplying by 1/3 is the same as dividing by 3.
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture18.PNG')
# 2x2 matrix example
z = np.array([[4, 3],
[3, 2]])
print (z)
print ('----------')
# calculate inverse matrix
z_inv = np.linalg.inv(z)
print(z_inv)
# check if inverse matrix is correct
I = np.round(z_inv @ z) # round away floating-point noise; z_inv @ z and z @ z_inv both return the identity
print ('----------')
print (I)
[[4 3] [3 2]] ---------- [[-2. 3.] [ 3. -4.]] ---------- [[1. 0.] [0. 1.]]
# 3x3 matrix example
z = np.array([[1, 4, 8],
[2, 1, 5],
[1, 7, 6]])
print (z)
print ('----------')
# calculate inverse matrix
z_inv = np.linalg.inv(z)
print(z_inv)
# check if inverse matrix is correct
I = np.round(z_inv @ z, 2) # round away floating-point noise; z_inv @ z and z @ z_inv both return the identity
print ('----------')
print (I)
[[1 4 8] [2 1 5] [1 7 6]] ---------- [[-0.61702128 0.68085106 0.25531915] [-0.14893617 -0.04255319 0.23404255] [ 0.27659574 -0.06382979 -0.14893617]] ---------- [[ 1. 0. 0.] [ 0. 1. -0.] [ 0. -0. 1.]]
Transpose Matrix: the matrix obtained by flipping the rows and columns of a matrix; entry (i,j) of M becomes entry (j,i) of the transpose.
Consider a matrix $M∈R^{m×n}$. The transpose of M is denoted as $M^T∈R^{n×m}$.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture19.PNG')
A = np.array([[3, -4],
[2, 0],
[-1, 6],
[5, 1]])
# Use T method to transpose matrix
A.T
array([[ 3, 2, -1, 5],
[-4, 0, 6, 1]])
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture25.PNG')
A = np.array([[3,5,7], [4, 9, 8]])
B = np.array([[1, 6, 3], [0, 2, 9]])
# Use np.multiply() or * for the Hadamard (element-wise) product
A*B
array([[ 3, 30, 21],
[ 0, 18, 72]])
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture91.PNG')
A matrix is said to be in echelon form when it has undergone the process of Gaussian Elimination. Specifically:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture27.PNG')
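SymPy can produce the (non-reduced) row echelon form directly; a small sketch using the Matrix.echelon_form() method:

```python
from sympy import Matrix

A = Matrix([[1, 2, 2],
            [1, 3, 3],
            [2, 6, 5]])
# echelon_form() performs the forward phase of Gaussian Elimination,
# leaving zeros below each pivot
E = A.echelon_form()
print(E)
```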
Design matrix: special name for matrices containing explanatory variables or features in the context of statistics and machine learning
The purpose of linear algebra is to solve systems of linear equations. Informally, this means to figure out the right combination of linear segments to obtain a desired outcome.
Matrices are used to denote systems of linear equations. Given a matrix A and vectors v and y in $R^3$, set up the system of linear equations as Av = y:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture28.PNG')
Geometrically the solution is equal to the line segment where the three planes intersect below:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture29.PNG')
Represent the system as a linear combination of the column vectors times a scaling term:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture31.PNG')
Gaussian Elimination: robust algorithm to solve linear systems by eliminating terms from a system of equations, such that it is simplified to the point where we obtain the row echelon form of the matrix. Sequential application of three elementary transformations is needed: swapping two rows, multiplying a row by a nonzero scalar, and adding a multiple of one row to another.
A consistent system of linear equations has at least one solution versus inconsistent systems that have no solution.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture34.PNG')
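A consistency check can be sketched with ranks: a system Av = y has at least one solution exactly when rank(A) equals the rank of the augmented matrix [A | y]. The matrix and targets below are made up for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # second row is 2x the first
y_consistent   = np.array([3.0, 6.0])  # compatible with that redundancy
y_inconsistent = np.array([3.0, 7.0])  # contradicts it

def has_solution(A, y):
    # consistent iff rank(A) == rank of the augmented matrix [A | y]
    aug = np.column_stack([A, y])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(aug)

print(has_solution(A, y_consistent))    # True
print(has_solution(A, y_inconsistent))  # False
```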
Gauss-Jordan Elimination: the only difference between Gaussian Elimination and Gauss-Jordan Elimination is that the latter "keeps going" with the elementary row operations until we obtain the reduced row echelon form
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture35.PNG')
A = np.array([[1,2,2], [1,3,3], [2,6,5]])
y = np.array([4, 5, 6])
# np.linalg.solve() solves Ax=y equations
display(A)
display(y)
np.linalg.solve(A, y)
array([[1, 2, 2],
[1, 3, 3],
[2, 6, 5]])
array([4, 5, 6])
array([ 2., -3., 4.])
What are all possible subspaces that can be “covered” by a collection of vectors in a matrix? There are four fundamental subspaces that can be “covered” by a matrix of valid vectors:
These subspaces are considered fundamental because they express many important properties of matrices in linear algebra.
Column Space C(A): composed of all linear combinations of the columns of matrix A. So C(A) is equal to the span of the columns of A. For a matrix $A∈R^{m×n}$ and a vector $v∈R^m$, the column space is defined as:
Formula in words: all linear combinations of the column vectors of A and entries of an n-dimensional vector v
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture32.PNG')
Row Space R(A): composed of all linear combinations of the rows of matrix A. In other words, R(A) is equal to the span of the rows of A. A different way to see the row space, is by transposing $A^T$. Now, we can define the row space simply as $R(A^T)$.
For a matrix $A∈R^{m×n}$ and a vector $w∈R^m$, the row space is defined as:
Formula in words: all linear combinations of the row vectors of A and entries of an m-dimensional vector w
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture33.PNG')
Null Space N(A): the null space of a matrix A is composed of all vectors that map into the zero vector when multiplied by A. For a matrix $A∈R^{m×n}$ and a vector $v∈R^n$, the null space is defined as:
Null Space of the Transpose $N(A^T)$: the left null space of a matrix A is composed of all vectors that map into the zero vector when multiplied by A from the left ("from the left" means the vectors are on the left side of matrix A).
For a matrix $A∈R^{m×n}$ and a vector $v∈R^m$, the left null space is defined as:
Matrix Basis: a set of n linearly independent column vectors with n elements forms a basis. To find the basis of a matrix, find out which vectors are linearly independent of each other (i.e. pivots of the echelon form indicate the set of linearly independent vectors in a matrix).
NumPy does not have a method to obtain the row echelon form of a matrix; SymPy does, via the rref() method.
A = Matrix([[1, 2, 3, -1],
[2, -1, -4, 8],
[-1, 1, 3, -5],
[-1, 2, 5, -6],
[-1, -2, -3, 1]])
A
Matrix([ [ 1, 2, 3, -1], [ 2, -1, -4, 8], [-1, 1, 3, -5], [-1, 2, 5, -6], [-1, -2, -3, 1]])
A_rref, A_pivots = A.rref()
A_rref
Matrix([ [1, 0, -1, 0], [0, 1, 2, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]])
print(f'pivot columns indexes are: {A_pivots}')
pivot columns indexes are: (0, 1, 3)
Standard Basis : a special orthonormal vector basis in which each basis vector has a single nonzero entry with value 1.
In $R^2$ space, the standard basis is: $e_1 = [1, 0]^T$, $e_2 = [0, 1]^T$
In $R^3$ space, the standard basis is: $e_1 = [1, 0, 0]^T$, $e_2 = [0, 1, 0]^T$, $e_3 = [0, 0, 1]^T$
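In NumPy, the standard basis vectors are the columns of the identity matrix, and any vector is recovered by weighting them with its own entries:

```python
import numpy as np

# Standard basis vectors in R^3 are the columns of the identity matrix
I = np.identity(3)
e1, e2, e3 = I[:, 0], I[:, 1], I[:, 2]

# Any vector is a linear combination of the standard basis,
# with its own entries as the weights
v = np.array([2.0, -5.0, 7.0])
print(2.0*e1 + -5.0*e2 + 7.0*e3)  # recovers v
```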
Matrix Rank: maximal number of linearly independent columns of a matrix; from an applied machine learning perspective, the rank of a matrix is relevant as a measure of the information content of the matrix
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture36.PNG')
Matrix Norm: measure the size of a matrix by computing its norm. Three of the most commonly used norms in machine learning are:
Frobenius Norm: think about this norm as flattening out the matrix into a long vector. For instance, a 3×3 matrix would become a vector with n=9 entries.
Formula in words: square each entry of matrix A, add them together, and then take the square root.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture37.PNG')
A = np.array([[-4, 2, 0],
[4, 1, 7],
[-1/2, 4, 10]])
np.round(np.linalg.norm(A, 'fro'), 3)
14.221
Max Norm / Infinity Norm: equals the largest absolute row sum of the matrix
Formula in words: go row by row, add the absolute value of each entry, and then select the largest row sum.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture38.PNG')
A = np.array([[-4, 2, 0],
[4, 1, 7],
[-1/2, 4, 10]])
np.linalg.norm(A, np.inf)
14.5
Spectral Norm: equals the largest singular value $σ_1$; related to eigenvectors and eigenvalues
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture39.PNG')
A = np.array([[-4, 2, 0],
[4, 1, 7],
[-1/2, 4, 10]])
np.round(np.linalg.norm(A, 2), 2)
12.94
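As a sanity check, the spectral norm should match the largest singular value returned by np.linalg.svd():

```python
import numpy as np

A = np.array([[-4, 2, 0],
              [4, 1, 7],
              [-1/2, 4, 10]])
# singular values come back in descending order, so sigma[0] is sigma_1
sigma = np.linalg.svd(A, compute_uv=False)
print(np.round(sigma[0], 2))              # largest singular value
print(np.round(np.linalg.norm(A, 2), 2))  # spectral norm: same value
```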
Linear Mappings / Linear Transformations / Linear Functions: a correspondence that sends each vector in a vector space V to a vector in a different vector space W
Linear mappings transform vector spaces into other vector spaces. Such transformations must be linear. Consider a linear mapping T and a pair of vectors x and y. To be valid, a linear mapping must satisfy these rules: T(x+y) = T(x)+T(y) and T(αx) = αT(x) for any scalar α.
In linear algebra, linear mappings are represented as matrices and performed by matrix multiplication. Take a vector x and a matrix A. We say that when A multiplies x, the matrix transforms the vector into another vector: T(x) = Ax
Typical notation for a linear mapping: T:V → W ... (for the vector spaces V and W)
Matrix-vector products are linear mappings. Two simple cases:
1. negation
2. reversal
Examples below test this for one vector, but mapping works on the entire vector space (i.e. the span) of a given dimensionality.
Negation Matrix: returns the opposite sign of each element of a vector
First test linear mapping: T(x+y) = T(x)+T(y)
x = np.array([[-1],
[0],
[1]])
y = np.array([[-3],
[0],
[2]])
# negation matrix
T = np.array([[-1,0,0],
[0,-1,0],
[0,0,-1]])
# @ is dot product
T@(x+y)
array([[ 4],
[ 0],
[-3]])
T@x + T@y
array([[ 4],
[ 0],
[-3]])
Second test T(αx) = αT(x), ∀α:
alpha = 10
T@(alpha*x)
array([[ 10],
[ 0],
[-10]])
alpha*(T@x)
array([[ 10],
[ 0],
[-10]])
Reversal Matrix: reverses the order of the elements of a vector; the last becomes the first, the second to last becomes the second, and so on
First test T(x+y) = T(x)+T(y):
x = np.array([[-1],
[0],
[1]])
y = np.array([[-3],
[0],
[2]])
# reversal matrix
T = np.array([[0,0,1],
[0,1,0],
[1,0,0]])
T@x
array([[ 1],
[ 0],
[-1]])
T@y
array([[ 2],
[ 0],
[-3]])
T@(x+y)
array([[ 3],
[ 0],
[-4]])
T@x + T@y
array([[ 3],
[ 0],
[-4]])
Second test T(αx) = αT(x), ∀α:
alpha = 10
T@(alpha*x)
array([[ 10],
[ 0],
[-10]])
alpha*(T@x)
array([[ 10],
[ 0],
[-10]])
There are several important linear mappings (or transformations) that can be expressed as matrix-vector multiplications of the form y=Ax. Such mappings are common in image processing, computer vision, and other linear applications.
Combinations of linear and nonlinear mappings are what complex models such as neural networks do to learn mappings from inputs to outputs. Six important linear mappings:
Scaling: mapping of the form y=Ax, with A=αI
Using $s_1$ and $s_2$ as scaling factors, a scaling matrix in $R^2$ takes the form:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture46.PNG')
# let matrix A be the scaling matrix
A = np.array([[5,0], [0,5]])
print (A)
print ('---------')
x = np.array([[0,2], [0, 4]])
print (x)
print ('---------')
y = A @ x
print (y)
[[5 0] [0 5]] --------- [[0 2] [0 4]] --------- [[ 0 10] [ 0 20]]
Reflection: mirror image of an object in Euclidean space; reflection of a vector x through a line that passes through the origin is:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture47.PNG')
Special cases of reflection:
# reflection along the horizontal axis, or around the line at 0 degrees from the origin
A1 = np.array([[1.0, 0],
[0, -1.0]])
# reflection along the vertical axis, or around the line at 90 degrees from the origin
A2 = np.array([[-1.0, 0],
[0, 1.0]])
# reflection along the line where the horizontal axis equals the vertical axis, or around the line at 45 degrees from the origin
A3 = np.array([[0, 1.0],
[1.0, 0]])
# Reflection along the line where the horizontal axis equals the negative of the vertical axis, or around the line at −45 degrees from the origin
A4 = np.array([[0, -1.0],
[-1.0, 0]])
x = np.array([[0,2], [0, 4]])
print (x)
[[0 2] [0 4]]
y1 = A1 @ x
y2 = A2 @ x
y3 = A3 @ x
y4 = A4 @ x
y1
array([[ 0., 2.],
[ 0., -4.]])
y2
array([[ 0., -2.],
[ 0., 4.]])
y3
array([[0., 4.],
[0., 2.]])
y4
array([[ 0., -4.],
[ 0., -2.]])
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture48.PNG')
Shear: displaces points of an object in a given direction (i.e. all points to the right) in a proportion equal to their perpendicular distance from an axis that remains fixed. A "proportion equal to their perpendicular distance" means that points further away from the reference axis displace more than points near the axis.
Having m as shear factor, for an object in $R^2$, a horizontal shear matrix takes the form:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture49.PNG')
Having m as shear factor, for an object in $R^2$, a vertical shear matrix takes the form:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture50.PNG')
# shear along the horizontal axis
A = np.array([[1.0, 1.5],
[0, 1.0]])
print (A)
print ('-------------')
v = np.array([[2, 4,],
[0, 4]])
print (v)
[[1. 1.5] [0. 1. ]] ------------- [[2 4] [0 4]]
y = A @ v
y
array([[ 2., 10.],
[ 0., 4.]])
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture51.PNG')
Rotation: moves objects about the origin in Euclidean space; by convention, a positive angle rotates counterclockwise
For the general case in $R^2$, the counterclockwise rotation of a vector x by θ radians is obtained as:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture52.PNG')
Special cases of rotation:
# 90-degree rotation
A1 = np.array([[0, -1.0],
[1, 0]])
# 180-degree rotation
A2 = np.array([[-1.0, 0],
[0, -1.0]])
# 270-degree rotation
A3 = np.array([[0, 1.0],
[-1.0, 0]])
x = np.array([[0, 2.0,],
[0, 4.0,]])
y1 = A1 @ x
y2 = A2 @ x
y3 = A3 @ x
y1
array([[ 0., -4.],
[ 0., 2.]])
y2
array([[ 0., -2.],
[ 0., -4.]])
y3
array([[ 0., 4.],
[ 0., -2.]])
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture53.PNG')
Projections: mappings from a space onto a subspace, or from a set of vectors onto a subset of vectors
For a vector space V and a vector subset U ⊂ V, we define a projection ϕ as:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture54.PNG')
Step by Step Math Problem Example:
# find the projection Pϕ from vector b onto a basis vector a
# basis vector a
a = np.array([[5],
[1]])
b = np.array([[1],
[4]])
# projection matrix
P = (a @ a.T) / (a.T @ a)
P
array([[0.96153846, 0.19230769],
[0.19230769, 0.03846154]])
# apply projection matrix to vector b
b_pred = b.T @ P
b_pred
array([[1.73076923, 0.34615385]])
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture56.PNG')
General Subspace Projections: by analogy, think of a group of people projecting themselves onto someone else, with that person representing a rough approximation of the character of the group.
For a set of basis vectors $a_1$ ... $a_m$, where A is the matrix of basis vectors:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture57.PNG')
Formula in words: take the sum of the product between each basis vector and α; we want the projection to be the minimal distance from vector b onto A, which we know implies orthogonal lines connecting vector b with matrix A
We already know the values inside matrix A and vector b, so what we want to solve for is α (the resulting expression is known as the Moore–Penrose pseudoinverse of A):
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture58.PNG')
This can be used to solve linear regression problems, where the common notation is: $α = (X^TX)^{−1} X^Ty$
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture59.PNG')
# vector b
b = np.array([3,-1,5])
# matrix A
A = np.array([
[1,2],
[-1,4],
[1,2]])
# np.linalg.solve() solves Ax=b equations
alpha = np.linalg.solve(A.T @ A, A.T @ b)
print (alpha)
print ('-------------')
Ax = A @ alpha
print(Ax.T)
[3. 0.5] ------------- [ 4. -1. 4.]
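The same normal-equations solution can be obtained with np.linalg.lstsq(), which solves the least-squares problem more stably than forming $(X^TX)^{-1}$ explicitly. A sketch with a made-up toy regression:

```python
import numpy as np

# Toy regression: columns of X are features (intercept and one variable), y is the target
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])

# Normal equations: alpha = (X^T X)^{-1} X^T y
alpha_normal = np.linalg.inv(X.T @ X) @ X.T @ y
# np.linalg.lstsq solves the same least-squares problem
alpha_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(alpha_normal, 3))
print(np.round(alpha_lstsq, 3))  # same coefficients
```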
Machine learning prediction problems usually require analysts to find a solution to systems of linear equations of the form: y = Ax. In other words, to represent y as linear combinations of the columns of matrix A.
In most cases y is not in the column space of A, meaning there is no way to find a linear combination of its columns that yields the target y. In such cases, we can use orthogonal projections to find approximate solutions to the system. We usually denote the approximated solution of a system of linear equations as $\hat{y}$.
Now, $\hat{y}$ will be in the span of the columns of A (meaning there exists a linear combination of the columns of A that yields $\hat{y}$) and will be the result of projecting y onto the subspace spanned by the columns of A. This solution is the closest approximation of y given the span of the columns of A.
Summary: the approximated solution $\hat{y}$ is the orthogonal projection of y onto the column space of A.
Norms: norms are not linear transformations; see the "Matrix Norm" section
Translation: geometric transformation that moves every vector in a vector space by the same distance in a given direction (i.e. move a cup of coffee from your left to your right, and you would have performed translation in $R^3$ space).
The translation matrix is represented with homogeneous coordinates instead of Cartesian coordinates. The homogeneous coordinate system adds an extra 1 at the end of vectors.
# vector in R^2 cartesian coordinates
x=np.array([[3], [3]])
x
array([[3],
[3]])
# becomes the following in R^2 homogeneous coordinates
x=np.array([[3], [3], [1]])
x
array([[3],
[3],
[1]])
The translation matrix for the general case cannot be represented with Cartesian coordinates. Homogeneous coordinates are the standard in fields like computer graphics since they allow us to better represent a series of transformations (or mappings) like scaling, translation, rotation, etc.
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture40.PNG')
# translation matrix
T = np.array([ [1, 0, 3],
[0, 1, 1],
[0, 0, 1]])
print (T)
print ('----------')
# vector x
x = np.array([ [2, 2, 1] ])
print (x.T)
T @ x.T
[[1 0 3] [0 1 1] [0 0 1]] ---------- [[2] [2] [1]]
array([[5],
[3],
[1]])
Affine Mappings: linear mapping + translation; an affine mapping M takes the form below where variable A is a linear mapping or transformation and variable b is the translation vector: M(x) = Ax+b
Linear regression is usually analyzed as a linear mapping plus noise, but it can also be seen as an affine mapping. Alternatively, we can say that Ax+b is a linear mapping if and only if b=0.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture41.PNG')
Affine Combinations of Vectors: linear combinations with an added constraint in which the sum of the weights β must equal 1. Linear combination definition (vectors $x_1$…$x_k$ and scalars $β_1$…$β_k ∈ R$):
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture42.PNG')
In practice, this defines a weighted average of the vectors. Chart below shows general consequence of constraining the sum of the weights to 1: every affine combination of the same set of vectors will map onto the same space.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture43.PNG')
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture44.PNG')
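A minimal sketch of an affine combination in $R^2$ (the vectors and weights below are chosen arbitrarily, with the weights summing to 1):

```python
import numpy as np

x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])

# Affine combination: the weights must sum to 1
betas = [0.25, 0.75]
v = betas[0]*x1 + betas[1]*x2
print(v)                             # a weighted average of x1 and x2
print(np.isclose(sum(betas), 1.0))   # the weight constraint holds

# For these particular x1, x2, every affine combination lies on the line u + w = 1
print(np.isclose(v.sum(), 1.0))
```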
Affine Span: the set of all linear combinations of the vector set such that the weights add up to 1 and all weights are real numbers. Hence, the fundamental difference between vector spans and affine spans is that the former will span the entire $R^n$ space (assuming independent vectors), whereas the latter spans a lower-dimensional set such as a line or a plane.
Three cases in $R^3$:
Affine Spaces: translations of vector spaces (vector spaces that have been offset from the origin of the coordinate system); any point, line, plane, or hyperplane in $R^n$ that does not go through the origin is an affine subspace (i.e. linear regression is an affine subspace)
Consider the matrix $A∈R^{m×n}$ and vectors x, b, $y∈R^n$
We can represent the system of linear equations Ax+b = y as a single matrix vector multiplication by using an augmented matrix known as affine transformation matrix:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture45.PNG')
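A sketch of building the augmented affine transformation matrix, with a hypothetical scaling as the linear part A and a hypothetical translation b:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 2.0]])   # linear part: scaling by 2
b = np.array([[3.0],
              [1.0]])        # translation part

# Augmented (affine transformation) matrix: [[A, b], [0, 1]]
M = np.block([[A, b],
              [np.zeros((1, 2)), np.ones((1, 1))]])

# Apply to x = (2, 2) in homogeneous coordinates
x_h = np.array([[2.0], [2.0], [1.0]])
print(M @ x_h)  # first two entries equal A @ x + b = (7, 5)
```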
Matrix Decomposition: break down a matrix into simpler elements / matrices to better understand its fundamental structure; opposite of linear combinations; multiple ways to decompose matrices
Applications of matrix factorization in machine learning include: clustering, recommender systems, dimensionality reduction, topic modeling, etc.
LU Decomposition: decompose a matrix into a lower triangular matrix (L) and an upper triangular matrix (U); can represent Gaussian Elimination (GE) in linear algebra; can be obtained from noninvertible matrices (matrix with no inverse) and from non-square matrices; in LU decomposition, use elementary matrices to perform GE style row operations
Formula (Matrix A represented as the product of lower triangular matrix L and upper triangular matrix U):
Elementary Matrices: matrices that perform a single fundamental row or column operation (i.e. addition of rows or multiplication of a row by a scalar) when multiplied against another matrix; elementary matrices "encode" fundamental column and row operations
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture60.PNG')
Introduction to Elementary Matrices:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture64.PNG')
The inverse is simply the opposite operation (i.e. if an elementary matrix multiplies the first row by 5, its inverse divides the first row by 5):
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture61.PNG')
Gaussian Elimination: robust algorithm to solve systems of linear equations (see earlier section); reduce matrices to row echelon form (upper triangular matrix) with zero rows at the bottom and zeros below the pivot for each column
Elementary matrices can cleverly organize the steps from Gaussian Elimination because you can represent row operations as multiplication by elementary matrices instead. See example below.
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture65.PNG')
Reduce Matrix A to row echelon form by creating lower triangular matrix using elementary matrices
# notice how we are impacting the second row only
# replicating R2-2R1
A = np.array([[1, 3, 5],
[2, 2, -1],
[1, 3, 2]])
l1 = np.array([[1, 0, 0],
[-2, 1, 0],
[0, 0, 1]])
l1 @ A
array([[ 1, 3, 5],
[ 0, -4, -11],
[ 1, 3, 2]])
# need to create all 0's in lower triangle: l2 combines both operations (R2-2R1 and R3-R1) so it can be applied directly to A
l2 = np.array([[1, 0, 0],
[-2, 1, 0],
[-1, 0, 1]])
# U = l2@A = upper triangular matrix resulting from Gaussian Elimination
U = l2 @ A
U
array([[ 1, 3, 5],
[ 0, -4, -11],
[ 0, 0, -3]])
# A = LU so we need to find L
# L is the inverse of l2
L = np.linalg.inv(l2)
L
array([[ 1., -0., -0.],
[ 2., 1., 0.],
[ 1., 0., 1.]])
# check if A = LU
L @ U
array([[ 1., 3., 5.],
[ 2., 2., -1.],
[ 1., 3., 2.]])
# confirmed that we recover A by multiplying L*U
A
array([[ 1, 3, 5],
[ 2, 2, -1],
[ 1, 3, 2]])
LU decomposition alone does not work when row permutations are required to solve a system of linear equations. Therefore, to perform all possible Gaussian Eliminations, permutation matrices (P) will need to be utilized. The complete decomposition of Matrix A is A = PLU (the convention used by SciPy's lu() below):
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture66.PNG')
P = np.array([[0, 1],
[1, 0]])
A = np.array([[0, 1],
[1, 1]])
P @ A
array([[1, 1],
[0, 1]])
Use SciPy to perform the PLU decomposition with the lu() function imported from scipy.linalg:
A = np.array([[-1, 3, 1, 0],
[7, 1, 6, 1],
[2, 7, 4, -2],
[2, 3, -9, 4]])
P, L, U = lu(A)
print(f'pivot matrix P is:\n{P}')
print(f'----------------------------------')
print(f'upper triangular matrix U is:\n{np.round(U, 2)}')
print(f'----------------------------------')
print(f'lower triangular matrix L is:\n{np.round(L, 2)}')
pivot matrix P is: [[0. 0. 0. 1.] [1. 0. 0. 0.] [0. 1. 0. 0.] [0. 0. 1. 0.]] ---------------------------------- upper triangular matrix U is: [[ 7. 1. 6. 1. ] [ 0. 6.71 2.29 -2.29] [ 0. 0. -11.64 4.64] [ 0. 0. 0. 1.53]] ---------------------------------- lower triangular matrix L is: [[ 1. 0. 0. 0. ] [ 0.29 1. 0. 0. ] [ 0.29 0.4 1. 0. ] [-0.14 0.47 -0.07 1. ]]
# confirm decomposition is right by multiplying P*L*U which should return matrix A
P @ L @ U
array([[-1., 3., 1., 0.],
[ 7., 1., 6., 1.],
[ 2., 7., 4., -2.],
[ 2., 3., -9., 4.]])
# original matrix A is recovered
A
array([[-1, 3, 1, 0],
[ 7, 1, 6, 1],
[ 2, 7, 4, -2],
[ 2, 3, -9, 4]])
QR Decomposition: used to solve systems of linear equations such as least squares problems and to find the eigenvalues of a general matrix; decomposes matrix A into an orthogonal matrix Q and an upper triangular matrix R:
Orthogonal vectors form an orthogonal basis for the column space of matrix A, i.e. the vector space spanned by its columns. Once you divide each vector by its length or norm, a unit basis vector is produced. This yields an orthonormal basis.
Normalized unit vector u ("hat" is added to vector u to signify normalized vector):
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture68.PNG')
x = np.array([[3],[4],[0]])
y = np.array([[-4],[3],[2]])
print (x)
print ('--------')
print (y)
print ('--------')
# check if vectors are orthogonal --> yes
x.T @ y
[[3] [4] [0]] -------- [[-4] [ 3] [ 2]] --------
array([[0]])
# normalize vectors to unit length
x_unit = x * (1/(np.linalg.norm(x, 2))) # denominator is Euclidean norm
y_unit = y * (1/(np.linalg.norm(y, 2)))
print (x_unit) # magnitude is one
print ('------')
print (np.round(y_unit, 2)) # magnitude is one
[[0.6] [0.8] [0. ]] ------ [[-0.74] [ 0.56] [ 0.37]]
# confirm magnitude of each vector equals 1 --> yes
print (np.linalg.norm(x_unit, 2))
print ('----')
print (np.round(np.linalg.norm(y_unit, 2)))
1.0 ---- 1.0
Q: special case of a matrix composed of orthonormal vectors; the matrix product with its transpose equals the identity (Q does not need to be square):
Q = np.column_stack((x_unit, y_unit))
Q
array([[ 0.6 , -0.74278135],
[ 0.8 , 0.55708601],
[ 0. , 0.37139068]])
np.round(Q.T @ Q,1)
array([[ 1., -0.],
[-0., 1.]])
Gram-Schmidt Orthogonalization Procedure: transforms a set of non-orthogonal vectors into orthogonal vectors; take the vectors of a matrix one by one and make each subsequent vector orthogonal to all the previous ones, then normalize each to unit length
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture92.PNG')
Check if the columns of matrix A are orthonormal: compute $A^TA$. If they are orthonormal, the result equals the identity matrix:
A = np.array([[2, 1, -2],
[7, -3, 1],
[-3, 5, -1]])
# check if A is orthonormal --> No
A.T @ A
array([[ 62, -34, 6],
[-34, 35, -10],
[ 6, -10, 6]])
Generate $q_2$ from $a_2$:
a1 = A[:, 0]
q1 = A[:, 0]
a2 = A[:, 1]
q2 = a2 - ((q1.T @ a2)/(q1.T @ q1)) * q1 # formula
q2
array([2.09677419, 0.83870968, 3.35483871])
# solved for q2 above; q2 should be orthogonal to q1 --> verify
np.round(q2.T @ q1, 2)
-0.0
Generate $q_3$ from $a_3$:
a3 = A[:, 2]
q3 = a3 - (((q1.T @ a3)/(q1.T @ q1)) * q1) - (((q2.T @ a3)/(q2.T @ q2)) * q2) # formula
q3
array([-1.33333333, 0.66666667, 0.66666667])
# solved for q3 above; q3 should be orthogonal to q1 and q2 --> verify
print (np.round(q1 @ q3))
print ('-------')
print (np.round(q2 @ q3))
-0.0 ------- -0.0
Generate matrix Q' with $q_1$, $q_2$, and $q_3$ as its columns. The Q' matrix has orthogonal, but not normalized, vectors.
Q_prime = np.column_stack((q1, q2, q3))
np.round(Q_prime)
array([[ 2., 2., -1.],
[ 7., 1., 1.],
[-3., 3., 1.]])
Matrix Q needs orthonormal vectors, so divide each column of Q' by its norm:
Q_norms = np.linalg.norm(Q_prime, 2, axis=0)
Q_norms
array([7.87400787, 4.04411161, 1.63299316])
# Q is solved
Q = Q_prime / Q_norms
np.round(Q, 2)
array([[ 0.25, 0.52, -0.82],
[ 0.89, 0.21, 0.41],
[-0.38, 0.83, 0.41]])
# verify each vector has magnitude of 1
np.linalg.norm(Q , 2, axis=0)
array([1., 1., 1.])
# verify code is correct: compute Q^T * Q = I --> Yes
np.round(Q.T @ Q, 2)
array([[ 1., -0., -0.],
[-0., 1., 0.],
[-0., 0., 1.]])
Gram-Schmidt Orthogonalization can be represented as QR Decomposition, similar to how Gaussian Elimination can be represented as LU decomposition. In QR decomposition, elementary matrices are used to perform column operations. Specifically, an upper triangular matrix is used to perform column operations in QR decomposition by multiplying A from the right side.
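A sketch tying the two together: once Q has orthonormal columns, the triangular factor follows directly from R = QᵀA (because QᵀQ = I implies QᵀA = QᵀQR = R). This repeats the Gram-Schmidt steps from above on the same matrix A:

```python
import numpy as np

A = np.array([[2, 1, -2],
              [7, -3, 1],
              [-3, 5, -1]], dtype=float)

# classical Gram-Schmidt, as in the step-by-step example above
q1 = A[:, 0]
q2 = A[:, 1] - (q1 @ A[:, 1]) / (q1 @ q1) * q1
q3 = A[:, 2] - (q1 @ A[:, 2]) / (q1 @ q1) * q1 - (q2 @ A[:, 2]) / (q2 @ q2) * q2

Q = np.column_stack((q1, q2, q3))
Q = Q / np.linalg.norm(Q, axis=0)    # normalize columns -> orthonormal Q

# because Q^T Q = I, the upper triangular factor is simply R = Q^T A
R = Q.T @ A
print(np.round(R, 2))                # entries below the diagonal are (numerically) zero
print(np.allclose(Q @ R, A))         # True: A = QR
```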
Decompose Matrix A into matrices Q and R.
Shortcut using qr() method:
A = np.array([[2, 1, -2],
[7, -3, 1],
[-3, 5, -1]])
# we should get the same matrix using Q_qr vs. Q but the signs are flipped
# the sign of each column is a convention choice (Q and -Q are both valid factors), not an error; be aware of it when comparing results
Q_qr, R = np.linalg.qr(A)
Q_qr
array([[-0.25400025, 0.51847585, -0.81649658],
[-0.88900089, 0.20739034, 0.40824829],
[ 0.38100038, 0.82956136, 0.40824829]])
R
array([[-7.87400787, 4.31800432, -0.76200076],
[ 0. , 4.04411161, -1.65912271],
[ 0. , 0. , 1.63299316]])
# returns original A matrix
(Q_qr) @ R
array([[ 2., 1., -2.],
[ 7., -3., 1.],
[-3., 5., -1.]])
Matrix Determinant: scalar number providing critical information about the matrix. Specifically:
Determinants can also be used to find the area of the parallelogram spanned by the columns of a 2×2 matrix in $R^2$, or the volume of the parallelepiped spanned by the columns of a 3×3 matrix in $R^3$. This topic plays a critical conceptual role in matrix decomposition, particularly eigenvalues and eigenvectors later on.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture73.PNG')
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture75.PNG')
# 2x2 matrix
A = np.array([[1, 3], [2, -3]])
A_det = np.round(np.linalg.det(A), 2)
A_abs = np.abs(np.round(np.linalg.det(A), 2))
print(f'determinant of matrix A equals: {A_det}')
print(f'area of parallelogram with column span of matrix A as vectors: {A_abs} units^2')
determinant of matrix A equals: -9.0 area of parallelogram with column span of matrix A as vectors: 9.0 units^2
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture74.PNG')
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture76.PNG')
# 3x3 matrix
A = np.array([[1, 3, 2], [-3, -1, -3], [2, 3, 1]])
A_det=np.round(np.linalg.det(A), 2)
A_abs = np.abs(np.round(np.linalg.det(A), 2))
print(f'determinant of matrix A equals: {A_det}')
print(f'volume of parallelepiped with column span of matrix A as vectors: {A_abs} units^3')
determinant of matrix A equals: -15.0 volume of parallelepiped with column span of matrix A as vectors: 15.0 units^3
Determinants as Scaling Factors: can be viewed as the factor by which areas are scaled under a linear mapping. For example, if you do a mapping using a matrix and the area increases by a factor of 5, then the determinant of the transformation matrix equals 5.
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture77.PNG')
Horizontal axis scaled by 3 and vertical axis scaled by 2 using matrix A: the new area increases by a factor of 6, which equals the determinant of Matrix A.
A = np.array([[3, 0],
[0, 2]])
x = np.array([2,2])
A @ x.T
array([6, 4])
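A quick check of the scaling-factor claim using the same diagonal matrix A: the unit square has area 1; after mapping its edge vectors by A, the resulting parallelogram has area 6, which equals det(A):

```python
import numpy as np

A = np.array([[3, 0],
              [0, 2]])

det_A = np.linalg.det(A)
print(det_A)                          # 6.0

# map the unit square's edge vectors e1, e2 by A
e1, e2 = np.array([1, 0]), np.array([0, 1])
v1, v2 = A @ e1, A @ e2

# area of the parallelogram spanned by v1 and v2 (2D cross product)
area = abs(v1[0] * v2[1] - v1[1] * v2[0])
print(area)                           # 6: areas are scaled by a factor of det(A)
```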
Eigen Related Concepts are used in machine learning programs such as markov chains, PCA, clustering, recommendation systems, etc. Some concepts being covered:
For a given n-dimensional vector space, any n linearly independent vectors form a basis. One set of basis vectors can be "moved" or transformed to another set of basis vectors with linear mappings.
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture80.PNG')
# Ax=u
# x: new vector basis
# x=A^-1*u
a1 = np.array([[1], [1]])
a2 = np.array([[0], [1]])
u = np.array([[4], [3]])
A = np.column_stack((a1, a2))
print (A)
print ('---------')
print (u)
print ('---------')
A_inv = np.linalg.inv(A)
print (A_inv)
x = A_inv @ u
x
[[1 0] [1 1]] --------- [[4] [3]] --------- [[ 1. 0.] [-1. 1.]]
array([[ 4.],
[-1.]])
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture81.PNG')
v = np.array([[-1], [-4]])
print (v)
print ('-------------------')
# T = transformation / mapping matrix
T = np.array([[-1/4, 1/4], [1/4, 1/4]])
print(T)
print ('-------------------')
T_inv = np.linalg.inv(T)
print (T_inv)
print ('-------------------')
print (T_inv @ v)
[[-1] [-4]] ------------------- [[-0.25 0.25] [ 0.25 0.25]] ------------------- [[-2. 2.] [ 2. 2.]] ------------------- [[ -6.] [-10.]]
Think of eigenvectors as the dominant personality trait of someone and the eigenvalue as the intensity of that trait. For an individual matrix, the eigenvector refers to the “characteristic vector” while eigenvalue refers to “characteristic value” for that vector.
Eigenvector of a Matrix: Non-zero vector that does not rotate or change direction since it can only get longer or shorter when multiplied by a transformation matrix A. There are infinitely many eigenvectors for each eigenvalue.
Eigenvalue of a Matrix: scalar multiple of the eigenvector
Example: Donald Trump's dominant personality trait is fumbling narcissism. This is his eigenvector. No matter if he is developing NYC real estate or governing in the White House, his personality revolves around it. The eigenvalue represents the scalar magnitude of his fumbling narcissism ... so off the charts.
Given square matrix A in $R^{nxn}$ space, eigenvector x, scalar λ, and identity matrix I, the defining equation Ax = λx can be rewritten as (A − λI)x = 0.
For a non-zero eigenvector x to exist, the matrix (A − λI) must be singular, which means its determinant must equal zero (the characteristic equation), so:
Since matrix A and identity matrix I are fixed, we need to solve for the λ values that yield a 0 determinant of the matrix (A−λI).
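Since SymPy is already imported at the top of this notebook, the characteristic polynomial det(A − λI) can also be computed symbolically (here using the matrix A from this section's worked example):

```python
from sympy import Matrix, symbols

lam = symbols('lambda')
A = Matrix([[-2, -4, 2],
            [-2, 1, 2],
            [4, 2, 5]])

# characteristic polynomial of A in the variable lambda
p = A.charpoly(lam)
print(p.as_expr())       # lambda**3 - 4*lambda**2 - 27*lambda + 90

# its roots are the eigenvalues: -5, 3, 6, each with multiplicity 1
print(A.eigenvals())
```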
Step by Step Math Problem Example:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture82.PNG')
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture83.PNG')
A = np.array([[-2, -4, 2],
[-2, 1, 2],
[4, 2, 5]])
print (A)
print ('-------------------------------------------------------')
eigenvalues, eigenvectors = np.linalg.eig(A) # returns the eigenvalues and a matrix whose columns are the normalized eigenvectors
print(f'eigenvalues of matrix A are:{eigenvalues}')
print ('-------------------------------------------------------')
# code returns NORMALIZED eigenvectors
print(f'normalized eigenvectors of matrix A are:\n{eigenvectors}')
[[-2 -4 2] [-2 1 2] [ 4 2 5]] ------------------------------------------------------- eigenvalues of matrix A are:[-5. 3. 6.] ------------------------------------------------------- normalized eigenvectors of matrix A are: [[ 0.81649658 0.53452248 0.05842062] [ 0.40824829 -0.80178373 0.35052374] [-0.40824829 -0.26726124 0.93472998]]
# verify normalized eigenvectors are correct
eigenvalue6_vector = np.array([[1], [6], [16]])
print (eigenvalue6_vector)
print ('----------------')
eigenvalue6_normvector = eigenvalue6_vector * (1/(np.linalg.norm(eigenvalue6_vector, 2))) # denominator is Euclidean norm
print (eigenvalue6_normvector) # equals column 3 of the eigenvectors matrix (the eigenvector for eigenvalue 6)
[[ 1] [ 6] [16]] ---------------- [[0.05842062] [0.35052374] [0.93472998]]
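The defining equation Ax = λx can also be checked directly against the output of np.linalg.eig: for each eigenpair, A @ v should equal λ * v:

```python
import numpy as np

A = np.array([[-2, -4, 2],
              [-2, 1, 2],
              [4, 2, 5]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# column i of eigenvectors is the eigenvector for eigenvalues[i]
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    print(np.allclose(A @ v, lam * v))   # True for every eigenpair
```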
Trace of Matrix: sum of matrix's diagonal elements. Formula for square matrix $A∈R^{n×n}$:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture84.PNG')
The sum of eigenvalues equals the trace of a matrix. For example, see Matrix A from previous block of code:
A = np.array([[-2, -4, 2],
[-2, 1, 2],
[4, 2, 5]])
A
array([[-2, -4, 2],
[-2, 1, 2],
[ 4, 2, 5]])
The sum of Matrix A's diagonal elements is:
trace = A.trace()
trace
4
Trace value of 4 equals the sum of Matrix A's eigenvalues:
print(f'eigenvalues of matrix A are:{eigenvalues}')
np.round(sum(eigenvalues))
eigenvalues of matrix A are:[-5. 3. 6.]
4.0
Eigenvalue Decomposition (ED): Using the eigenvalue algorithm to find the eigenvalues and associated eigenvectors of a matrix; can only be performed on square matrices; decomposition may not be possible
The goal of ED is to solve for eigenvalues and associated eigenvectors using a single matrix-matrix operation. Given formula:
The goal is to multiply the matrix of eigenvectors by the diagonal matrix of eigenvalues. Because matrix multiplication is NOT commutative (order matters), use the formula:
# put normalized eigenvectors in matrix X
X = eigenvectors
print (X)
print ('-------------')
# build the diagonal eigenvalue matrix Λ using identity matrix I
I = np.identity(3)
print (I)
print ('-------------')
E = eigenvalues * I
print (E)
[[ 0.81649658 0.53452248 0.05842062] [ 0.40824829 -0.80178373 0.35052374] [-0.40824829 -0.26726124 0.93472998]] ------------- [[1. 0. 0.] [0. 1. 0.] [0. 0. 1.]] ------------- [[-5. 0. 0.] [-0. 3. 0.] [-0. 0. 6.]]
Given formula AX = XΛ, the left side of the equation AX equals:
print(f'AX = \n{A @ X}')
AX = [[-4.0824829 1.60356745 0.35052374] [-2.04124145 -2.40535118 2.10314246] [ 2.04124145 -0.80178373 5.60837988]]
Given formula AX = XΛ, the right side of the equation XΛ equals:
print(f'XΛ = \n{X @ E}') # order of multiplication matters!
XΛ = [[-4.0824829 1.60356745 0.35052374] [-2.04124145 -2.40535118 2.10314246] [ 2.04124145 -0.80178373 5.60837988]]
# verify equality of matrices
print (f'element wise equality? {np.allclose(A @ X, X @ E)}') # don't use equality operator for matrix comparison
element wise equality? True
Final step of eigenvalue decomposition seeks to solve for matrix A. Formula is:
$X$: eigenvectors matrix; its columns are the “axes” of the transformation, mapping coordinates from the eigenbasis back to the standard basis
$Λ$: matrix that can stretch, shrink, or flip the vectors
$X^{−1}$: change basis (rotation) from standard basis into the eigenbasis
X_inv = np.linalg.inv(X)
print (f'matrix A: \n{A}')
print ('----------------')
print (f'reconstruction of matrix A using eigenvalue decomposition: \n{np.round(X @ E @ X_inv, 2)}')
matrix A: [[-2 -4 2] [-2 1 2] [ 4 2 5]] ---------------- reconstruction of matrix A using eigenvalue decomposition: [[-2. -4. 2.] [-2. 1. 2.] [ 4. 2. 5.]]
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture86.PNG')
Eigenbasis: a set of linearly independent vectors (basis) in which every vector is an eigenvector; serves as a good basis that can simplify computations and improve understanding of the linear mapping(s).
In previous example using matrix A below, the eigenbasis equals:
A = np.array([[-2, -4, 2],
[-2, 1, 2],
[4, 2, 5]])
A
# code returns NORMALIZED eigenvectors
print(f'normalized eigenbasis:\n{eigenvectors}')
normalized eigenbasis: [[ 0.81649658 0.53452248 0.05842062] [ 0.40824829 -0.80178373 0.35052374] [-0.40824829 -0.26726124 0.93472998]]
Singular Value Decomposition (SVD): general decomposition method that can be used on all matrices including non-square and singular matrices. SVD is used to reduce a dataset containing a large number of values to a dataset containing significantly fewer values, but which still contains a large fraction of the variability present in the original data.
For any rectangular matrix $A∈R^{m×n}$, it can be decomposed as the product of an orthogonal matrix $U∈R^{m×m}$, a rectangular diagonal matrix $Σ∈R^{m×n}$, and the transpose of another orthogonal matrix, $V^T∈R^{n×n}$:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture87.PNG')
Singular Values are the non-negative values along the diagonal of matrix Σ. They can be viewed as akin to the eigenvalues in Eigenvalue Decomposition.
There are two situations of note for matrix A with examples below:
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture88.PNG')
Geometric interpretation of the SVD: sequence of linear mappings or transformations
The main difference between SVD and Eigenvalue Decomposition is in matrix U. Instead of going back to the standard basis, U performs a change of basis onto another direction.
Illustrate the effect of $A∈R^{3×2}$ on a pair of vectors in the standard basis. In the formula $A=UΣV^T$, the left orthogonal matrix (U) has 3 column vectors ... it generates the third dimension, which is orthogonal to the ellipse surface.
Image(filename='C:\\Users\\Pinzhi Zhang\\LinAlg_Python_Images\\Capture90.PNG')
# 2 x 3 matrix
A_2x3 = np.array([[2, 1, 0],
[-3, 0, 1]])
print (A_2x3)
[[ 2 1 0] [-3 0 1]]
# Decomposition of 2x3 matrix
U1, S1, V_T1 = np.linalg.svd(A_2x3)
print(f'Left orthogonal 2x3 matrix A:\n{np.round(U1, 2)}\n')
print(f'Singular values diagonal 2x3 matrix A:\n{np.round(S1, 2)}\n')
print(f'Right orthogonal 2x3 matrix A:\n{np.round(V_T1, 2)}')
Left orthogonal 2x3 matrix A: [[-0.55 0.83] [ 0.83 0.55]] Singular values diagonal 2x3 matrix A: [3.74 1. ] Right orthogonal 2x3 matrix A: [[-0.96 -0.15 0.22] [-0. 0.83 0.55] [ 0.27 -0.53 0.8 ]]
# 3 x 2 matrix
A_3x2 = np.array([[2, 1],
[-3, 0],
[0, 2]])
print (A_3x2)
[[ 2 1] [-3 0] [ 0 2]]
# Decomposition of 3x2 matrix
U2, S2, V_T2 = np.linalg.svd(A_3x2)
print(f'Left orthogonal matrix for 3x2 matrix A:\n{np.round(U2, 2)}\n')
print(f'Singular values diagonal matrix for 3x2 matrix A:\n{np.round(S2, 2)}\n')
print(f'Right orthogonal matrix for 3x2 matrix A:\n{np.round(V_T2, 2)}')
Left orthogonal matrix for 3x2 matrix A: [[-0.59 -0.24 -0.77] [ 0.8 -0.32 -0.51] [-0.13 -0.91 0.38]] Singular values diagonal matrix for 3x2 matrix A: [3.67 2.13] Right orthogonal matrix for 3x2 matrix A: [[-0.97 -0.23] [ 0.23 -0.97]]
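Note that np.linalg.svd returns the singular values as a 1-D array, not as the m×n matrix Σ, so to reconstruct a rectangular A the values must first be placed on the diagonal of a zero m×n matrix. A sketch using the same 3×2 matrix as above:

```python
import numpy as np

A_3x2 = np.array([[2, 1],
                  [-3, 0],
                  [0, 2]])

U2, S2, V_T2 = np.linalg.svd(A_3x2)

# build the rectangular diagonal matrix Sigma (3x2) from the 1-D singular values
Sigma = np.zeros(A_3x2.shape)
np.fill_diagonal(Sigma, S2)      # puts S2 on the main diagonal, rest stays 0

# confirm A = U * Sigma * V^T
print(np.allclose(U2 @ Sigma @ V_T2, A_3x2))   # True
```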
# 3 x 2 matrix
A_3x2 = np.array([[1, 0],
[0, 1],
[1, 1]])
print (A_3x2)
[[1 0] [0 1] [1 1]]
# Decomposition of 3x2 matrix
U2, S2, V_T2 = np.linalg.svd(A_3x2)
print(f'Left orthogonal matrix for 3x2 matrix A:\n{np.round(U2, 2)}\n')
print(f'Singular values diagonal matrix for 3x2 matrix A:\n{np.round(S2, 2)}\n')
print(f'Right orthogonal matrix for 3x2 matrix A:\n{np.round(V_T2, 2)}')
Left orthogonal matrix for 3x2 matrix A: [[-0.41 0.71 -0.58] [-0.41 -0.71 -0.58] [-0.82 -0. 0.58]] Singular values diagonal matrix for 3x2 matrix A: [1.73 1. ] Right orthogonal matrix for 3x2 matrix A: [[-0.71 -0.71] [ 0.71 -0.71]]
# 3 x 3 matrix: col 3 equals 2 x col 1
A_3x3 = np.array([[2, 1, 4],
[-3, 0, -6],
[1, 2, 2]])
print (A_3x3)
[[ 2 1 4] [-3 0 -6] [ 1 2 2]]
# Decomposition of 3x3 matrix
U3, S3, V_T3 = np.linalg.svd(A_3x3)
print(f'Left orthogonal matrix for 3x3 matrix A:\n{np.round(U3, 2)}\n')
print(f'Singular values diagonal matrix for 3x3 matrix A:\n{np.round(S3, 2)}')
print (f'Note the third singular value equals 0 which means the third column contains redundant information. \n')
print(f'Right orthogonal matrix for 3x3 matrix A:\n{np.round(V_T3, 2)}')
Left orthogonal matrix for 3x3 matrix A: [[-0.54 -0.2 -0.82] [ 0.79 -0.46 -0.41] [-0.29 -0.86 0.41]] Singular values diagonal matrix for 3x3 matrix A: [8.44 1.95 0. ] Note the third singular value equals 0 which means the third column contains redundant information. Right orthogonal matrix for 3x3 matrix A: [[-0.44 -0.13 -0.89] [ 0.06 -0.99 0.12] [ 0.89 0. -0.45]]
One common way to overcome the issue of real world applications with millions of rows and columns is to utilize low-rank approximations of the original matrices. Low rank means using a subset of orthogonal vectors instead of the full set of orthogonal vectors so that we can obtain a “reasonably” good approximation of the original matrix.
Low-rank approximations are possible because most data points can be computed as linear combinations of a subset of orthogonal vectors. Specific examples include principal component analysis, factor analysis, and general dimensionality reduction techniques.
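A minimal sketch of a rank-k truncated-SVD approximation (the small random matrix here is a hypothetical stand-in; in practice A would be very large): keep only the k largest singular values and their corresponding singular vectors. By the Eckart-Young theorem, the spectral-norm error of the best rank-k approximation equals the first discarded singular value:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))      # hypothetical data matrix

U, S, V_T = np.linalg.svd(A, full_matrices=False)

k = 2   # keep only the 2 largest singular values / directions
A_k = U[:, :k] @ np.diag(S[:k]) @ V_T[:k, :]

print(np.linalg.matrix_rank(A_k))                # 2: a low-rank approximation of A
print(np.round(np.linalg.norm(A - A_k, 2), 4))   # spectral-norm error of the approximation
print(np.round(S[2], 4))                         # equals the first discarded singular value
```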
Important things to remember: